Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
We consider the paradigm of a black box AI system that makes life-critical
decisions. We propose an "arguing machines" framework that pairs the primary AI
system with a secondary one that is independently trained to perform the same
task. We show that disagreement between the two systems, without any knowledge
of underlying system design or operation, is sufficient to arbitrarily improve
the accuracy of the overall decision pipeline given human supervision over
disagreements. We demonstrate this system in two applications: (1) an
illustrative example of image classification and (2) large-scale real-world
semi-autonomous driving data. For the first application, we apply this
framework to image classification, achieving a reduction from 8.0% to 2.8% top-5
error on ImageNet. For the second application, we apply this framework to Tesla
Autopilot and demonstrate the ability to predict 90.4% of system disengagements
that were labeled by human annotators as challenging and needing human
supervision.
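The disagreement-based supervision idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy models, the `truth` labeling function, the perfect `human` oracle, and all names below are hypothetical and chosen only to make the control flow concrete. Both systems are treated as black boxes, and a human is consulted only when their outputs differ.

```python
from typing import Callable, Iterable, Set, Tuple

Label = int
Model = Callable[[int], Label]


def arbitrate(
    x: int,
    primary: Model,
    secondary: Model,
    ask_human: Callable[[int, Label, Label], Label],
) -> Tuple[Label, bool]:
    """Return (decision, escalated) for one input.

    The two systems are queried as black boxes. If they agree, their shared
    answer is accepted; if they disagree, the case goes to the human.
    """
    a, b = primary(x), secondary(x)
    if a == b:
        return a, False
    return ask_human(x, a, b), True


# --- Toy setup (all assumptions, for illustration only) ---
def truth(x: int) -> Label:
    """Hypothetical ground-truth label for input x."""
    return x % 3


def make_model(wrong_on: Set[int]) -> Model:
    """Build a toy model that errs only on the given inputs."""
    def model(x: int) -> Label:
        return (truth(x) + 1) % 3 if x in wrong_on else truth(x)
    return model


primary = make_model({2, 5})    # errs on inputs 2 and 5
secondary = make_model({5, 7})  # errs on inputs 5 and 7


def human(x: int, a: Label, b: Label) -> Label:
    """Assumed perfect human supervisor; real annotators are not."""
    return truth(x)


def run(inputs: Iterable[int]) -> Tuple[int, int]:
    """Return (number of escalations, number of final errors)."""
    items = list(inputs)
    results = [arbitrate(x, primary, secondary, human) for x in items]
    escalated = sum(1 for _, was_escalated in results if was_escalated)
    errors = sum(1 for x, (d, _) in zip(items, results) if d != truth(x))
    return escalated, errors


if __name__ == "__main__":
    print(run(range(10)))
```

In this toy run the primary model alone makes two errors on inputs 0 through 9. With arbitration, two cases (inputs 2 and 7) are escalated and one error remains (input 5), because both toy models make the same mistake there and so never disagree. How often that happens in practice depends on how independent the two systems' errors are.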
An Initial Assessment of the Significance of Task Pacing on Self-Report and Physiological Measures of Workload While Driving
In block A of a simulator study, a sample of 38 drivers showed a stepwise increase in heart rate and skin conductance level (SCL) from single task driving across 3 levels of an auditory presentation – verbal response dual task (n-back), replicating findings from on-road research. Subjective ratings showed a similar stepwise increase, establishing concurrent validity of the physiological indices as measures of workload. In block B, varying the inter-stimulus interval in the intermediate 1-back level of the task resulted in a pattern across self-report workload ratings, heart rate, and SCL suggesting that task pacing may influence effective workload. Further consideration of the impact of task pacing in auditory-verbal in-vehicle applications is indicated.
A Comparison of Heart Rate and Heart Rate Variability Indices in Distinguishing Single-Task Driving and Driving Under Secondary Cognitive Workload
Heart rate and heart rate variability (HRV) measures collected under actual highway driving from 25 young adults were compared to assess the relative sensitivity of each for distinguishing between a period of single task driving and periods of low and high additional cognitive workload. Basic heart rate, skin conductance and most, but not all, of the HRV indices were significantly different between single task driving and the high secondary demand period. Heart rate and skin conductance were also robust at distinguishing between single task driving and the low added demand period; however, several HRV measures did not show statistically significant differences between these two periods, and the remaining HRV measures that did were less robust than basic heart rate as assessed by effect size and observed power. Rather than attempting to argue for the inherent superiority of any one physiological measure, these findings are presented with the intent of encouraging a broader discussion around the conditions under which particular physiological measures may be most useful and/or complementary for detecting different aspects of workload and operator state.
Toward an Antiphony Framework for Dividing Tasks into Subtasks
Task analysis is a staple of ergonomics, neuroergonomics, human factors, and experimental psychology inquiry, and often benefits from granularity beyond the task level to the subtask level. The concept and challenge of identifying the subcomponents of tasks are neither new nor solved. Practitioners routinely identify subdivisions that are individually internally consistent and yet conflict with one another. The challenge of producing reliable, valid subtask data across efforts recommends a unified framework for identifying consistent subtask divisions within tasks. A framework is here forwarded, based upon universal “antiphony” turn-taking behavior in human-human interaction, but adapted to address the highly scripted vocabulary of human-machine interaction. Practical application to a real-world vehicle interface is demonstrated, and an example is discussed in the light of research design, applied use, and future improvement.
A Field Study Assessing Driving Performance, Visual Attention, Heart Rate and Subjective Ratings in Response to Two Types of Cognitive Workload
In an on-road experiment, driving performance, visual attention, heart rate and subjective ratings of workload were evaluated in response to a working memory (n-back) and a visual-spatial (clock) task. Subjective workload ratings for the two types of tasks did not statistically differ, suggesting a similar level of overall workload. Gaze concentration and heart rate showed significant changes relative to single task driving during the extra tasks, and the magnitude of change was similar for both, while driving performance measures were not sensitive to the increase in workload. The results suggest high sensitivity of both gaze dispersion and heart rate as measures of workload across these two different types of cognitive demand.
- …